home *** CD-ROM | disk | FTP | other *** search
- Subject: v14i081: Flex, a lex replacement, Part03/05
- Newsgroups: comp.sources.unix
- Sender: sources
- Approved: rsalz@uunet.UU.NET
-
- Submitted-by: Vern Paxson <vern@lbl-csam.arpa>
- Posting-number: Volume 14, Issue 81
- Archive-name: flex/part03
-
- #! /bin/sh
- # This is a shell archive. Remove anything before this line, then unpack
- # it by saving it into a file and typing "sh file". To overwrite existing
- # files, type "sh file -c". You can also feed this as standard input via
- # unshar, or by typing "sh <file", e.g.. If this archive is complete, you
- # will see the following message at the end:
- # "End of archive 3 (of 5)."
- # Contents: flex.1 flexdef.h nfa.c
- # Wrapped by rsalz@fig.bbn.com on Tue May 3 17:31:30 1988
- PATH=/bin:/usr/bin:/usr/ucb ; export PATH
- if test -f 'flex.1' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'flex.1'\"
- else
- echo shar: Extracting \"'flex.1'\" \(15558 characters\)
- sed "s/^X//" >'flex.1' <<'END_OF_FILE'
- X.TH FLEX 1 "13 May 1987"
- X.SH NAME
- flex - fast lexical analyzer generator
- X.SH SYNOPSIS
- X.B flex
- X[
- X.B -dfirstvFILT -c[efmF] -Sskeleton_file
- X] [
- X.I filename
- X]
- X.SH DESCRIPTION
- X.I flex
- is a rewrite of
- X.I lex
- intended to right some of that tool's deficiencies: in particular,
- X.I flex
- generates lexical analyzers much faster, and the analyzers use
- smaller tables and run faster.
- X.SH OPTIONS
- In addition to lex's
- X.B -t
- flag, flex has the following options:
- X.TP
- X.B -d
- makes the generated scanner run in
- X.I debug
- mode. Whenever a pattern is recognized the scanner will
- write to
- X.I stderr
- a line of the form:
- X.nf
- X
- X --accepting rule #n
- X
- X.fi
- Rules are numbered sequentially with the first one being 1.
- X.TP
- X.B -f
- has the same effect as lex's -f flag (do not compress the scanner
- tables); the mnemonic changes from
- X.I fast compilation
- to (take your pick)
- X.I full table
- or
- X.I fast scanner.
- The actual compilation takes
- X.I longer,
- since flex is I/O bound writing out the big table.
- X.IP
- This option is equivalent to
- X.B -cf
- X(see below).
- X.TP
- X.B -i
- instructs flex to generate a
- X.I case-insensitive
- scanner. The case of letters given in the flex input patterns will
- be ignored, and the rules will be matched regardless of case. The
- matched text given in
- X.I yytext
- will have the preserved case (i.e., it will not be folded).
- X.TP
- X.B -r
- specifies that the scanner uses the
- X.B REJECT
- action.
- X.TP
- X.B -s
- causes the
- X.I default rule
- X(that unmatched scanner input is echoed to
- X.I stdout)
- to be suppressed. If the scanner encounters input that does not
- match any of its rules, it aborts with an error. This option is
- useful for finding holes in a scanner's rule set.
- X.TP
- X.B -v
- has the same meaning as for lex (print to
- X.I stderr
- a summary of statistics of the generated scanner). Many more statistics
- are printed, though, and the summary spans several lines. Most
- of the statistics are meaningless to the casual flex user.
- X.TP
- X.B -F
- specifies that the
- X.ul
- fast
- scanner table representation should be used. This representation is
- about as fast as the full table representation
- X.ul
- X(-f),
- and for some sets of patterns will be considerably smaller (and for
- others, larger). In general, if the pattern set contains both "keywords"
- and a catch-all, "identifier" rule, such as in the set:
- X.nf
- X
- X "case" return ( TOK_CASE );
- X "switch" return ( TOK_SWITCH );
- X ...
- X "default" return ( TOK_DEFAULT );
- X [a-z]+ return ( TOK_ID );
- X
- X.fi
- then you're better off using the full table representation. If only
- the "identifier" rule is present and you then use a hash table or some such
- to detect the keywords, you're better off using
- X.ul
- X-F.
- X.IP
- This option is equivalent to
- X.B -cF
- X(see below).
- X.TP
- X.B -I
- instructs flex to generate an
- X.I interactive
- scanner. Normally, scanners generated by flex always look ahead one character
- before deciding that a rule has been matched. At the possible cost of some
- scanning overhead (it's not clear that more overhead is involved), flex will
- generate a scanner which only looks ahead when needed. Such scanners are
- called
- X.I interactive
- because if you want to write a scanner for an interactive system such
- as a command shell, you will probably want the user's input to be terminated
- with a newline, and without
- X.B -I
- the user will have to type a character in addition to the newline in order
- to have the newline recognized. This leads to dreadful interactive performance.
- X.IP
- If all this seems to confusing, here's the general rule: if a human will
- be typing in input to your scanner, use
- X.B -I,
- otherwise don't; if you don't care about how fast your scanners run and
- don't want to make any assumptions about the input to your scanner,
- always use
- X.B -I.
- X.IP
- Note,
- X.B -I
- cannot be used in conjunction with
- X.I full
- or
- X.I fast tables,
- i.e., the
- X.B -f, -F, -cf,
- or
- X.B -cF
- flags.
- X.TP
- X.B -L
- instructs flex to not generate
- X.B #line
- directives (see below).
- X.TP
- X.B -T
- makes flex run in
- X.I trace
- mode. It will generate a lot of messages to standard out concerning
- the form of the input and the resultant non-deterministic and deterministic
- finite automatons. This option is mostly for use in maintaining flex.
- X.TP
- X.B -c[efmF]
- controls the degree of table compression.
- X.B -ce
- directs flex to construct
- X.I equivalence classes,
- i.e., sets of characters
- which have identical lexical properties (for example, if the only
- appearance of digits in the flex input is in the character class
- X"[0-9]" then the digits '0', '1', ..., '9' will all be put
- in the same equivalence class).
- X.B -cf
- specifies that the
- X.I full
- scanner tables should be generated - flex should not compress the
- tables by taking advantages of similar transition functions for
- different states.
- X.B -cF
- specifies that the alternate fast scanner representation (described
- above under the
- X.B -F
- flag)
- should be used.
- X.B -cm
- directs flex to construct
- X.I meta-equivalence classes,
- which are sets of equivalence classes (or characters, if equivalence
- classes are not being used) that are commonly used together.
- A lone
- X.B -c
- specifies that the scanner tables should be compressed but neither
- equivalence classes nor meta-equivalence classes should be used.
- X.IP
- The options
- X.B -cf
- or
- X.B -cF
- and
- X.B -cm
- do not make sense together - there is no opportunity for meta-equivalence
- classes if the table is not being compressed. Otherwise the options
- may be freely mixed.
- X.IP
- The default setting is
- X.B -cem
- which specifies that flex should generate equivalence classes
- and meta-equivalence classes. This setting provides the highest
- degree of table compression. You can trade off
- faster-executing scanners at the cost of larger tables with
- the following generally being true:
- X.nf
- X
- X slowest smallest
- X -cem
- X -ce
- X -cm
- X -c
- X -c{f,F}e
- X -c{f,F}
- X fastest largest
- X
- X.fi
- X.TP
- X.B -Sskeleton_file
- overrides the default skeleton file from which flex constructs
- its scanners. You'll never need this option unless you are doing
- flex maintenance or development.
- X.SH INCOMPATIBILITIES WITH LEX
- X.I flex
- is fully compatible with
- X.I lex
- with the following exceptions:
- X.IP -
- There is no run-time library to link with. You needn't
- specify
- X.I -ll
- when linking, and you must supply a main program. (Hacker's note: since
- the lex library contains a main() which simply calls yylex(), you actually
- X.I can
- be lazy and not supply your own main program and link with
- X.I -ll.)
- X.IP -
- lex's
- X.B %r
- X(Ratfor scanners) and
- X.B %t
- X(translation table) options
- are not supported.
- X.IP -
- The do-nothing
- X.ul
- X-n
- flag is not supported.
- X.IP -
- When definitions are expanded, flex encloses them in parentheses.
- With lex, the following
- X.nf
- X
- X NAME [A-Z][A-Z0-9]*
- X %%
- X foo{NAME}? printf( "Found it\\n" );
- X %%
- X
- X.fi
- will not match the string "foo" because when the macro
- is expanded the rule is equivalent to "foo[A-Z][A-Z0-9]*?"
- and the precedence is such that the '?' is associated with
- X"[A-Z0-9]*". With flex, the rule will be expanded to
- X"foo([A-z][A-Z0-9]*)?" and so the string "foo" will match.
- X.IP -
- X.B yymore()
- is not supported.
- X.IP -
- The undocumented lex-scanner internal variable
- X.B yylineno
- is not supported.
- X.IP -
- If your input uses
- X.B REJECT,
- you must run flex with the
- X.B -r
- flag. If you leave out the flag, the scanner will abort at run-time
- with a message that the scanner was compiled without the flag being
- specified.
- X.IP -
- The
- X.B input()
- routine is not redefinable, though may be called to read characters
- following whatever has been matched by a rule. If
- X.B input()
- encounters and end-of-file the normal
- X.B yywrap()
- processing is done. A ``real'' end-of-file is returned as
- X.I EOF.
- X.IP
- Input can be controlled by redefining the
- X.B YY_INPUT
- macro.
- YY_INPUT's calling sequence is "YY_INPUT(buf,result,max_size)". Its
- action is to place up to max_size characters in the character buffer "buf"
- and return in the integer variable "result" either the
- number of characters read or the constant YY_NULL (0 on Unix systems)
- systems) to indicate EOF. The default YY_INPUT reads from the
- file-pointer "yyin" (which is by default
- X.I stdin),
- so if you
- just want to change the input file, you needn't redefine
- YY_INPUT - just point yyin at the input file.
- X.IP
- A sample redefinition of YY_INPUT (in the first section of the input
- file):
- X.nf
- X
- X %{
- X #undef YY_INPUT
- X #define YY_INPUT(buf,result,max_size) \\
- X result = (buf[0] = getchar()) == EOF ? YY_NULL : 1;
- X %}
- X
- X.fi
- You also can add in things like counting keeping track of the
- input line number this way; but don't expect your scanner to
- go very fast.
- X.IP -
- X.B output()
- is not supported.
- Output from the ECHO macro is done to the file-pointer
- X"yyout" (default
- X.I stdout).
- X.IP -
- Trailing context is restricted to patterns which have either
- a fixed-sized leading part or a fixed-sized trailing part.
- XFor example, "a*/b" and "a/b*" are okay, but not "a*/b*".
- This restriction is due to a bug in the trailing context
- algorithm given in
- X.I Principles of Compiler Design
- X(and
- X.I Compilers - Principles, Techniques, and Tools)
- which can result in mismatches. Try the following lex program
- X.nf
- X
- X %%
- X x+/xy printf( "I found \\"%s\\"\\n", yytext );
- X
- X.fi
- on the input "xxy". (If anyone knows of a fast algorithm for
- finding the beginning of trailing context for an arbitrary
- pair of regular expressions, please let me know!)
- If you must have arbitrary trailing context, you can use
- X.B yyless()
- to effect it.
- X.IP -
- flex reads only one input file, while lex's input is made
- up of the concatenation of its input files.
- X.SH ENHANCEMENTS
- X.IP -
- X.I Exclusive start-conditions
- can be declared by using
- X.B %x
- instead of
- X.B %s.
- These start-conditions have the property that when they are active,
- X.I no other rules are active.
- Thus a set of rules governed by the same exclusive start condition
- describe a scanner which is independent of any of the other rules in
- the flex input. This feature makes it easy to specify "mini-scanners"
- which scan portions of the input that are syntactically different
- from the rest (e.g., comments).
- X.IP -
- flex dynamically resizes its internal tables, so directives like "%a 3000"
- are not needed when specifying large scanners.
- X.IP -
- The scanning routine generated by flex is declared using the macro
- X.B YY_DECL.
- By redefining this macro you can change the routine's name and
- its calling sequence. For example, you could use:
- X.nf
- X
- X #undef YY_DECL
- X #define YY_DECL float lexscan( a, b ) float a, b;
- X
- X.fi
- to give it the name
- X.I lexscan,
- returning a float, and taking two floats as arguments.
- X.IP -
- flex generates
- X.B #line
- directives mapping lines in the output to
- their origin in the input file.
- X.IP -
- You can put multiple actions on the same line, separated with
- semi-colons. With lex, the following
- X.nf
- X
- X foo handle_foo(); return 1;
- X
- X.fi
- is truncated to
- X.nf
- X
- X foo handle_foo();
- X
- X.fi
- flex does not truncate the action. Actions that are not enclosed in
- braces are terminated at the end of the line.
- X.IP -
- Actions can be begun with
- X.B %{
- and terminated with
- X.B %}.
- In this case, flex does not count braces to figure out where the
- action ends - actions are terminated by the closing
- X.B %}.
- This feature is useful when the enclosed action has extraneous
- braces in it (usually in comments or inside inactive #ifdef's)
- that throw off the brace-count.
- X.IP -
- All of the scanner actions (e.g.,
- X.B ECHO, yywrap ...)
- except the
- X.B unput()
- and
- X.B input()
- routines,
- are written as macros, so they can be redefined if necessary
- without requiring a separate library to link to.
- X.SH FILES
- X.TP
- X.I flex.skel
- skeleton scanner
- X.TP
- X.I flex.fastskel
- skeleton scanner for -f and -F
- X.TP
- X.I flexskelcom.h
- common definitions for skeleton files
- X.TP
- X.I flexskeldef.h
- definitions for compressed skeleton file
- X.TP
- X.I fastskeldef.h
- definitions for -f, -F skeleton file
- X.SH "SEE ALSO"
- X.LP
- lex(1)
- X.LP
- M. E. Lesk and E. Schmidt,
- X.I LEX - Lexical Analyzer Generator
- X.SH AUTHOR
- Vern Paxson, with the help of many ideas and much inspiration from
- Van Jacobson. Original version by Jef Poskanzer. Fast table
- representation is a partial implementation of a design done by Van
- Jacobson. The implementation was done by Kevin Gong and Vern Paxson.
- X.LP
- Thanks to the many flex beta-testers, especially Casey Leedom,
- Nick Christopher, Chris Faylor, Eric Goldman, Craig Leres, Mohamed el Lozy,
- Esmond Pitt, Jef Poskanzer, and Dave Tallman. Thanks to John Gilmore,
- Bob Mulcahy,
- Rich Salz, and Richard Stallman for help with various distribution headaches.
- X.LP
- Send comments to:
- X.nf
- X
- X Vern Paxson
- X Real Time Systems
- X Bldg. 46A
- X Lawrence Berkeley Laboratory
- X 1 Cyclotron Rd.
- X Berkeley, CA 94720
- X
- X (415) 486-6411
- X
- X vern@lbl-{csam,rtsg}.arpa
- X ucbvax!lbl-csam.arpa!vern
- X
- X.fi
- X.SH DIAGNOSTICS
- X.LP
- X.I flex scanner jammed -
- a scanner compiled with
- X.B -s
- has encountered an input string which wasn't matched by
- any of its rules.
- X.LP
- X.I flex input buffer overflowed -
- a scanner rule matched a string long enough to overflow the
- scanner's internal input buffer (as large as
- X.B BUFSIZ
- in "/usr/include/stdio.h"). You can edit
- X.I flexskelcom.h
- and increase
- X.B YY_BUF_SIZE
- and
- X.B YY_MAX_LINE
- to increase this limit.
- X.LP
- X.I REJECT used and scanner was
- X.I not generated using -r -
- just like it sounds. Your scanner uses
- X.B REJECT.
- You must run flex on the scanner description using the
- X.B -r
- flag.
- X.LP
- X.I old-style lex command ignored -
- the flex input contains a lex command (e.g., "%n 1000") which
- is being ignored.
- X.SH BUGS
- X.LP
- Use of unput() or input() trashes the current yytext and yyleng.
- X.LP
- Use of unput() to push back more text than was matched can
- result in the pushed-back text matching a beginning-of-line ('^')
- rule even though it didn't come at the beginning of the line.
- X.LP
- Nulls are not allowed in flex inputs or in the inputs to
- scanners generated by flex. Their presence generates fatal
- errors.
- X.LP
- Do not mix trailing context with the '|' operator used to
- specify that multiple rules use the same action. That is,
- avoid constructs like:
- X.nf
- X
- X foo/bar |
- X bletch |
- X bugprone { ... }
- X
- X.fi
- They can result in subtle mismatches. This is actually not
- a problem if there is only one rule
- using trailing context and it is the first in the list (so the
- above example will actually work okay). The
- problem is due to fall-through in the action switch statement,
- causing non-trailing-context rules to execute the
- trailing-context code of their fellow rules. This should
- be fixed, as it's a nasty bug and not obvious. The proper fix is
- for flex to spit out a FLEX_TRAILING_CONTEXT_USED #define and then
- have the backup logic in a separate table which is consulted for
- each rule-match, rather than as part of the rule action. The
- place to do the tweaking is in add_accept() - any kind soul want
- to be a hero?
- X.LP
- The pattern:
- X.nf
- X
- X x{3}
- X
- X.fi
- is considered to be variable-length for the purposes of trailing
- context, even though it has a clear fixed length.
- X.LP
- Due to both buffering of input and read-ahead, you cannot intermix
- calls to, for example,
- X.B getchar()
- with flex rules and expect it to work. Call
- X.B input()
- instead.
- X.LP
- The total table entries listed by the
- X.B -v
- flag excludes the number of table entries needed to determine
- what rule has been matched. The number of entries is equal
- to the number of DFA states if the scanner was not compiled
- with
- X.B -r,
- and greater than the number of states if it was.
- X.LP
- The scanner run-time speeds have not been optimized as much
- as they deserve. Van Jacobson's work shows that the can go quite
- a bit faster still.
- END_OF_FILE
- if test 15558 -ne `wc -c <'flex.1'`; then
- echo shar: \"'flex.1'\" unpacked with wrong size!
- fi
- # end of 'flex.1'
- fi
- if test -f 'flexdef.h' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'flexdef.h'\"
- else
- echo shar: Extracting \"'flexdef.h'\" \(17327 characters\)
- sed "s/^X//" >'flexdef.h' <<'END_OF_FILE'
- X/*
- X * Definitions for flex.
- X *
- X * modification history
- X * --------------------
- X * 02b kg, vp 30sep87 .added definitions for fast scanner; misc. cleanup
- X * 02a vp 27jun86 .translated into C/FTL
- X */
- X
- X/*
- X * Copyright (c) 1987, the University of California
- X *
- X * The United States Government has rights in this work pursuant to
- X * contract no. DE-AC03-76SF00098 between the United States Department of
- X * Energy and the University of California.
- X *
- X * This program may be redistributed. Enhancements and derivative works
- X * may be created provided the new works, if made available to the general
- X * public, are made available for use by anyone.
- X */
- X
- X#include <stdio.h>
- X
- X#ifdef SV
- X#include <string.h>
- X#define bzero(s, n) memset((char *)(s), '\000', (unsigned)(n))
- X#else
- X#include <strings.h>
- X#endif
- X
- char *sprintf(); /* keep lint happy */
- X
- X
- X/* maximum line length we'll have to deal with */
- X#define MAXLINE BUFSIZ
- X
- X/* maximum size of file name */
- X#define FILENAMESIZE 1024
- X
- X#define min(x,y) (x < y ? x : y)
- X#define max(x,y) (x > y ? x : y)
- X
- X#define true 1
- X#define false 0
- X
- X
- X#ifndef DEFAULT_SKELETON_FILE
- X#define DEFAULT_SKELETON_FILE "flex.skel"
- X#endif
- X
- X#ifndef FAST_SKELETON_FILE
- X#define FAST_SKELETON_FILE "flex.fastskel"
- X#endif
- X
- X/* special nxt[] action number for the "at the end of the input buffer" state */
- X/* note: -1 is already taken by YY_NEW_FILE */
- X#define END_OF_BUFFER_ACTION -3
- X/* action number for default action for fast scanners */
- X#define DEFAULT_ACTION -2
- X
- X/* special chk[] values marking the slots taking by end-of-buffer and action
- X * numbers
- X */
- X#define EOB_POSITION -1
- X#define ACTION_POSITION -2
- X
- X/* number of data items per line for -f output */
- X#define NUMDATAITEMS 10
- X
- X/* number of lines of data in -f output before inserting a blank line for
- X * readability.
- X */
- X#define NUMDATALINES 10
- X
- X/* transition_struct_out() definitions */
- X#define TRANS_STRUCT_PRINT_LENGTH 15
- X
- X/* returns true if an nfa state has an epsilon out-transition slot
- X * that can be used. This definition is currently not used.
- X */
- X#define FREE_EPSILON(state) \
- X (transchar[state] == SYM_EPSILON && \
- X trans2[state] == NO_TRANSITION && \
- X finalst[state] != state)
- X
- X/* returns true if an nfa state has an epsilon out-transition character
- X * and both slots are free
- X */
- X#define SUPER_FREE_EPSILON(state) \
- X (transchar[state] == SYM_EPSILON && \
- X trans1[state] == NO_TRANSITION) \
- X
- X/* maximum number of NFA states that can comprise a DFA state. It's real
- X * big because if there's a lot of rules, the initial state will have a
- X * huge epsilon closure.
- X */
- X#define INITIAL_MAX_DFA_SIZE 750
- X#define MAX_DFA_SIZE_INCREMENT 750
- X
- X/* array names to be used in generated machine. They're short because
- X * we write out one data statement (which names the array) for each element
- X * in the array.
- X */
- X
- X#define ALIST 'l' /* points to list of rules accepted for a state */
- X#define ACCEPT 'a' /* list of rules accepted for a state */
- X#define ECARRAY 'e' /* maps input characters to equivalence classes */
- X#define MATCHARRAY 'm' /* maps equivalence classes to meta-equivalence classes */
- X#define BASEARRAY 'b' /* "base" array */
- X#define DEFARRAY 'd' /* "default" array */
- X#define NEXTARRAY 'n' /* "next" array */
- X#define CHECKARRAY 'c' /* "check" array */
- X
- X/* NIL must be 0. If not, its special meaning when making equivalence classes
- X * (it marks the representative of a given e.c.) will be unidentifiable
- X */
- X#define NIL 0
- X
- X#define JAM -1 /* to mark a missing DFA transition */
- X#define NO_TRANSITION NIL
- X#define UNIQUE -1 /* marks a symbol as an e.c. representative */
- X#define INFINITY -1 /* for x{5,} constructions */
- X
- X/* size of input alphabet - should be size of ASCII set */
- X#define CSIZE 127
- X
- X#define INITIAL_MAXCCLS 100 /* max number of unique character classes */
- X#define MAXCCLS_INCREMENT 100
- X
- X/* size of table holding members of character classes */
- X#define INITIAL_MAX_CCL_TBL_SIZE 500
- X#define MAX_CCL_TBL_SIZE_INCREMENT 250
- X
- X#define INITIAL_MNS 2000 /* default maximum number of nfa states */
- X#define MNS_INCREMENT 1000 /* amount to bump above by if it's not enough */
- X
- X#define INITIAL_MAX_DFAS 1000 /* default maximum number of dfa states */
- X#define MAX_DFAS_INCREMENT 1000
- X
- X#define JAMSTATE -32766 /* marks a reference to the state that always jams */
- X
- X/* enough so that if it's subtracted from an NFA state number, the result
- X * is guaranteed to be negative
- X */
- X#define MARKER_DIFFERENCE 32000
- X#define MAXIMUM_MNS 31999
- X
- X/* maximum number of nxt/chk pairs for non-templates */
- X#define INITIAL_MAX_XPAIRS 2000
- X#define MAX_XPAIRS_INCREMENT 2000
- X
- X/* maximum number of nxt/chk pairs needed for templates */
- X#define INITIAL_MAX_TEMPLATE_XPAIRS 2500
- X#define MAX_TEMPLATE_XPAIRS_INCREMENT 2500
- X
- X#define SYM_EPSILON 0 /* to mark transitions on the symbol epsilon */
- X
- X#define INITIAL_MAX_SCS 40 /* maximum number of start conditions */
- X#define MAX_SCS_INCREMENT 40 /* amount to bump by if it's not enough */
- X
- X#define ONE_STACK_SIZE 500 /* stack of states with only one out-transition */
- X#define SAME_TRANS -1 /* transition is the same as "default" entry for state */
- X
- X/* the following percentages are used to tune table compression:
- X
- X * the percentage the number of out-transitions a state must be of the
- X * number of equivalence classes in order to be considered for table
- X * compaction by using protos
- X */
- X#define PROTO_SIZE_PERCENTAGE 15
- X
- X/* the percentage the number of homogeneous out-transitions of a state
- X * must be of the number of total out-transitions of the state in order
- X * that the state's transition table is first compared with a potential
- X * template of the most common out-transition instead of with the first
- X * proto in the proto queue
- X */
- X#define CHECK_COM_PERCENTAGE 50
- X
- X/* the percentage the number of differences between a state's transition
- X * table and the proto it was first compared with must be of the total
- X * number of out-transitions of the state in order to keep the first
- X * proto as a good match and not search any further
- X */
- X#define FIRST_MATCH_DIFF_PERCENTAGE 10
- X
- X/* the percentage the number of differences between a state's transition
- X * table and the most similar proto must be of the state's total number
- X * of out-transitions to use the proto as an acceptable close match
- X */
- X#define ACCEPTABLE_DIFF_PERCENTAGE 50
- X
- X/* the percentage the number of homogeneous out-transitions of a state
- X * must be of the number of total out-transitions of the state in order
- X * to consider making a template from the state
- X */
- X#define TEMPLATE_SAME_PERCENTAGE 60
- X
- X/* the percentage the number of differences between a state's transition
- X * table and the most similar proto must be of the state's total number
- X * of out-transitions to create a new proto from the state
- X */
- X#define NEW_PROTO_DIFF_PERCENTAGE 20
- X
- X/* the percentage the total number of out-transitions of a state must be
- X * of the number of equivalence classes in order to consider trying to
- X * fit the transition table into "holes" inside the nxt/chk table.
- X */
- X#define INTERIOR_FIT_PERCENTAGE 15
- X
- X/* size of region set aside to cache the complete transition table of
- X * protos on the proto queue to enable quick comparisons
- X */
- X#define PROT_SAVE_SIZE 2000
- X
- X#define MSP 50 /* maximum number of saved protos (protos on the proto queue) */
- X
- X/* maximum number of out-transitions a state can have that we'll rummage
- X * around through the interior of the internal fast table looking for a
- X * spot for it
- X */
- X#define MAX_XTIONS_FOR_FULL_INTERIOR_FIT 4
- X
- X/* number that, if used to subscript an array, has a good chance of producing
- X * an error; should be small enough to fit into a short
- X */
- X#define BAD_SUBSCRIPT -32767
- X
- X/* absolute value of largest number that can be stored in a short, with a
- X * bit of slop thrown in for general paranoia.
- X */
- X#define MAX_SHORT 32766
- X
- X
- X/* Declarations for global variables. */
- X
- X/* variables for symbol tables:
- X * sctbl - start-condition symbol table
- X * ndtbl - name-definition symbol table
- X * ccltab - character class text symbol table
- X */
- X
- struct hash_entry
- X {
- X struct hash_entry *prev, *next;
- X char *name;
- X char *str_val;
- X int int_val;
- X } ;
- X
- typedef struct hash_entry *hash_table[];
- X
- X#define NAME_TABLE_HASH_SIZE 101
- X#define START_COND_HASH_SIZE 101
- X#define CCL_HASH_SIZE 101
- X
- extern struct hash_entry *ndtbl[NAME_TABLE_HASH_SIZE];
- extern struct hash_entry *sctbl[START_COND_HASH_SIZE];
- extern struct hash_entry *ccltab[CCL_HASH_SIZE];
- X
- X
- X/* variables for flags:
- X * printstats - if true (-v), dump statistics
- X * syntaxerror - true if a syntax error has been found
- X * eofseen - true if we've seen an eof in the input file
- X * ddebug - if true (-d), make a "debug" scanner
- X * trace - if true (-T), trace processing
- X * spprdflt - if true (-s), suppress the default rule
- X * interactive - if true (-I), generate an interactive scanner
- X * caseins - if true (-i), generate a case-insensitive scanner
- X * useecs - if true (-ce flag), use equivalence classes
- X * fulltbl - if true (-cf flag), don't compress the DFA state table
- X * usemecs - if true (-cm flag), use meta-equivalence classes
- X * reject - if true (-r flag), generate tables for REJECT macro
- X * fullspd - if true (-F flag), use Jacobson method of table representation
- X * gen_line_dirs - if true (i.e., no -L flag), generate #line directives
- X */
- X
- extern int printstats, syntaxerror, eofseen, ddebug, trace, spprdflt;
- extern int interactive, caseins, useecs, fulltbl, usemecs, reject;
- extern int fullspd, gen_line_dirs;
- X
- X
- X/* variables used in the flex input routines:
- X * datapos - characters on current output line
- X * dataline - number of contiguous lines of data in current data
- X * statement. Used to generate readable -f output
- X * skelfile - fd of the skeleton file
- X * yyin - input file
- X * temp_action_file - temporary file to hold actions
- X * action_file_name - name of the temporary file
- X * infilename - name of input file
- X * linenum - current input line number
- X */
- X
- extern int datapos, dataline, linenum;
- extern FILE *skelfile, *yyin, *temp_action_file;
- extern char *infilename;
- extern char *action_file_name;
- X
- X
- X/* variables for stack of states having only one out-transition:
- X * onestate - state number
- X * onesym - transition symbol
- X * onenext - target state
- X * onedef - default base entry
- X * onesp - stack pointer
- X */
- X
- extern int onestate[ONE_STACK_SIZE], onesym[ONE_STACK_SIZE];
- extern int onenext[ONE_STACK_SIZE], onedef[ONE_STACK_SIZE], onesp;
- X
- X
- X/* variables for nfa machine data:
- X * current_mns - current maximum on number of NFA states
- X * accnum - number of the last accepting state
- X * firstst - physically the first state of a fragment
- X * lastst - last physical state of fragment
- X * finalst - last logical state of fragment
- X * transchar - transition character
- X * trans1 - transition state
- X * trans2 - 2nd transition state for epsilons
- X * accptnum - accepting number
- X * lastnfa - last nfa state number created
- X */
- X
- extern int current_mns;
- extern int accnum, *firstst, *lastst, *finalst, *transchar;
- extern int *trans1, *trans2, *accptnum, lastnfa;
- X
- X
- X/* variables for protos:
- X * numtemps - number of templates created
- X * numprots - number of protos created
- X * protprev - backlink to a more-recently used proto
- X * protnext - forward link to a less-recently used proto
- X * prottbl - base/def table entry for proto
- X * protcomst - common state of proto
- X * firstprot - number of the most recently used proto
- X * lastprot - number of the least recently used proto
- X * protsave contains the entire state array for protos
- X */
- X
- extern int numtemps, numprots, protprev[MSP], protnext[MSP], prottbl[MSP];
- extern int protcomst[MSP], firstprot, lastprot, protsave[PROT_SAVE_SIZE];
- X
- X
- X/* variables for managing equivalence classes:
- X * numecs - number of equivalence classes
- X * nextecm - forward link of Equivalence Class members
- X * ecgroup - class number or backward link of EC members
- X * nummecs - number of meta-equivalence classes (used to compress
- X * templates)
- X * tecfwd - forward link of meta-equivalence classes members
- X * tecbck - backward link of MEC's
- X */
- X
- extern int numecs, nextecm[CSIZE + 1], ecgroup[CSIZE + 1], nummecs;
- extern int tecfwd[CSIZE + 1], tecbck[CSIZE + 1];
- X
- X
- X/* variables for start conditions:
- X * lastsc - last start condition created
- X * current_max_scs - current limit on number of start conditions
- X * scset - set of rules active in start condition
- X * scbol - set of rules active only at the beginning of line in a s.c.
- X * scxclu - true if start condition is exclusive
- X * actvsc - stack of active start conditions for the current rule
- X */
- X
- extern int lastsc, current_max_scs, *scset, *scbol, *scxclu, *actvsc;
- X
- X
- X/* variables for dfa machine data:
- X * current_max_dfa_size - current maximum number of NFA states in DFA
- X * current_max_xpairs - current maximum number of non-template xtion pairs
- X * current_max_template_xpairs - current maximum number of template pairs
- X * current_max_dfas - current maximum number DFA states
- X * lastdfa - last dfa state number created
- X * nxt - state to enter upon reading character
- X * chk - check value to see if "nxt" applies
- X * tnxt - internal nxt table for templates
- X * base - offset into "nxt" for given state
- X * def - where to go if "chk" disallows "nxt" entry
- X * tblend - last "nxt/chk" table entry being used
- X * firstfree - first empty entry in "nxt/chk" table
- X * dss - nfa state set for each dfa
- X * dfasiz - size of nfa state set for each dfa
- X * dfaacc - accepting set for each dfa state (or accepting number, if
- X * -r is not given)
- X * accsiz - size of accepting set for each dfa state
- X * dhash - dfa state hash value
- X * todo - queue of DFAs still to be processed
- X * todo_head - head of todo queue
- X * todo_next - next available entry on todo queue
- X * numas - number of DFA accepting states created; note that this
- X * is not necessarily the same value as accnum, which is the analogous
- X * value for the NFA
- X * numsnpairs - number of state/nextstate transition pairs
- X * jambase - position in base/def where the default jam table starts
- X * jamstate - state number corresponding to "jam" state
- X * end_of_buffer_state - end-of-buffer dfa state number
- X */
- X
- extern int current_max_dfa_size, current_max_xpairs;
- extern int current_max_template_xpairs, current_max_dfas;
- extern int lastdfa, lasttemp, *nxt, *chk, *tnxt;
- extern int *base, *def, tblend, firstfree, **dss, *dfasiz;
- extern union dfaacc_union
- X {
- X int *dfaacc_set;
- X int dfaacc_state;
- X } *dfaacc;
- extern int *accsiz, *dhash, *todo, todo_head, todo_next, numas;
- extern int numsnpairs, jambase, jamstate;
- extern int end_of_buffer_state;
- X
- X/* variables for ccl information:
- X * lastccl - ccl index of the last created ccl
- X * current_maxccls - current limit on the maximum number of unique ccl's
- X * cclmap - maps a ccl index to its set pointer
- X * ccllen - gives the length of a ccl
- X * cclng - true for a given ccl if the ccl is negated
- X * cclreuse - counts how many times a ccl is re-used
- X * current_max_ccl_tbl_size - current limit on number of characters needed
- X * to represent the unique ccl's
- X * ccltbl - holds the characters in each ccl - indexed by cclmap
- X */
- X
- extern int lastccl, current_maxccls, *cclmap, *ccllen, *cclng, cclreuse;
- extern int current_max_ccl_tbl_size;
- extern char *ccltbl;
- X
- X
- X/* variables for miscellaneous information:
- X * starttime - real-time when we started
- X * endtime - real-time when we ended
- X * nmstr - last NAME scanned by the scanner
- X * sectnum - section number currently being parsed
- X * nummt - number of empty nxt/chk table entries
- X * hshcol - number of hash collisions detected by snstods
- X * dfaeql - number of times a newly created dfa was equal to an old one
- X * numeps - number of epsilon NFA states created
- X * eps2 - number of epsilon states which have 2 out-transitions
- X * num_reallocs - number of times it was necessary to realloc() a group
- X * of arrays
- X * tmpuses - number of DFA states that chain to templates
- X * totnst - total number of NFA states used to make DFA states
- X * peakpairs - peak number of transition pairs we had to store internally
- X * numuniq - number of unique transitions
- X * numdup - number of duplicate transitions
- X * hshsave - number of hash collisions saved by checking number of states
- X */
- X
- extern char *starttime, *endtime, nmstr[MAXLINE];
- extern int sectnum, nummt, hshcol, dfaeql, numeps, eps2, num_reallocs;
- extern int tmpuses, totnst, peakpairs, numuniq, numdup, hshsave;
- X
- char *allocate_array(), *reallocate_array();
- X
- X#define allocate_integer_array(size) \
- X (int *) allocate_array( size, sizeof( int ) )
- X
- X#define reallocate_integer_array(array,size) \
- X (int *) reallocate_array( (char *) array, size, sizeof( int ) )
- X
- X#define allocate_integer_pointer_array(size) \
- X (int **) allocate_array( size, sizeof( int * ) )
- X
- X#define allocate_dfaacc_union(size) \
- X (union dfaacc_union *) \
- X allocate_array( size, sizeof( union dfaacc_union ) )
- X
- X#define reallocate_integer_pointer_array(array,size) \
- X (int **) reallocate_array( (char *) array, size, sizeof( int * ) )
- X
- X#define reallocate_dfaacc_union(array, size) \
- X (union dfaacc_union *) reallocate_array( (char *) array, size, sizeof( union dfaacc_union ) )
- X
- X#define allocate_character_array(size) allocate_array( size, sizeof( char ) )
- X
- X#define reallocate_character_array(array,size) \
- X reallocate_array( array, size, sizeof( char ) )
- X
- X
- X/* used to communicate between scanner and parser. The type should really
- X * be YYSTYPE, but we can't easily get our hands on it.
- X */
- extern int yylval;
- END_OF_FILE
- if test 17327 -ne `wc -c <'flexdef.h'`; then
- echo shar: \"'flexdef.h'\" unpacked with wrong size!
- fi
- # end of 'flexdef.h'
- fi
- if test -f 'nfa.c' -a "${1}" != "-c" ; then
- echo shar: Will not clobber existing file \"'nfa.c'\"
- else
- echo shar: Extracting \"'nfa.c'\" \(13633 characters\)
- sed "s/^X//" >'nfa.c' <<'END_OF_FILE'
- X/* nfa - NFA construction routines */
- X
- X/*
- X * Copyright (c) 1987, the University of California
- X *
- X * The United States Government has rights in this work pursuant to
- X * contract no. DE-AC03-76SF00098 between the United States Department of
- X * Energy and the University of California.
- X *
- X * This program may be redistributed. Enhancements and derivative works
- X * may be created provided the new works, if made available to the general
- X * public, are made available for use by anyone.
- X */
- X
- X#include "flexdef.h"
- X
- X/* add_accept - add an accepting state to a machine
- X *
- X * synopsis
- X *
- X * add_accept( mach, headcnt, trailcnt );
- X *
- X * the global ACCNUM is incremented and the new value becomes mach's
- X * accepting number. if headcnt or trailcnt is non-zero then the machine
- X * recognizes a pattern with trailing context. headcnt is the number of
- X * characters in the matched part of the pattern, or zero if the matched
- X * part has variable length. trailcnt is the number of trailing context
- X * characters in the pattern, or zero if the trailing context has variable
- X * length.
- X */
- X
- add_accept( mach, headcnt, trailcnt )
- int mach, headcnt, trailcnt;
- X
- X {
- X int astate;
- X
- X fprintf( temp_action_file, "case %d:\n", ++accnum );
- X
- X if ( headcnt > 0 || trailcnt > 0 )
- X { /* do trailing context magic to not match the trailing characters */
- X fprintf( temp_action_file,
- X "YY_DO_BEFORE_SCAN; /* undo effects of setting up yytext */\n" );
- X
- X if ( headcnt > 0 )
- X {
- X int head_offset = headcnt - 1;
- X
- X if ( fullspd || fulltbl )
- X /* with the fast skeleton, yy_c_buf_p points to the *next*
- X * character to scan, rather than the one that was last
- X * scanned
- X */
- X ++head_offset;
- X
- X if ( head_offset > 0 )
- X fprintf( temp_action_file, "yy_c_buf_p = yy_b_buf_p + %d;\n",
- X head_offset );
- X
- X else
- X fprintf( temp_action_file, "yy_c_buf_p = yy_b_buf_p;\n" );
- X }
- X
- X else
- X fprintf( temp_action_file, "yy_c_buf_p -= %d;\n", trailcnt );
- X
- X fprintf( temp_action_file, "YY_DO_BEFORE_ACTION; /* set up yytext again */\n" );
- X }
- X
- X line_directive_out( temp_action_file );
- X
- X /* hang the accepting number off an epsilon state. if it is associated
- X * with a state that has a non-epsilon out-transition, then the state
- X * will accept BEFORE it makes that transition, i.e., one character
- X * too soon
- X */
- X
- X if ( transchar[finalst[mach]] == SYM_EPSILON )
- X accptnum[finalst[mach]] = accnum;
- X
- X else
- X {
- X astate = mkstate( SYM_EPSILON );
- X accptnum[astate] = accnum;
- X mach = link_machines( mach, astate );
- X }
- X }
- X
- X
- X/* copysingl - make a given number of copies of a singleton machine
- X *
- X * synopsis
- X *
- X * newsng = copysingl( singl, num );
- X *
- X * newsng - a new singleton composed of num copies of singl
- X * singl - a singleton machine
- X * num - the number of copies of singl to be present in newsng
- X */
- X
- int copysingl( singl, num )
- int singl, num;
- X
- X {
- X int copy, i;
- X
- X copy = mkstate( SYM_EPSILON );
- X
- X for ( i = 1; i <= num; ++i )
- X copy = link_machines( copy, dupmachine( singl ) );
- X
- X return ( copy );
- X }
- X
- X
- X/* dumpnfa - debugging routine to write out an nfa
- X *
- X * synopsis
- X * int state1;
- X * dumpnfa( state1 );
- X */
- X
- dumpnfa( state1 )
- int state1;
- X
- X {
- X int sym, tsp1, tsp2, anum, ns;
- X
- X fprintf( stderr, "\n\n********** beginning dump of nfa with start state %d\n",
- X state1 );
- X
- X /* we probably should loop starting at firstst[state1] and going to
- X * lastst[state1], but they're not maintained properly when we "or"
- X * all of the rules together. So we use our knowledge that the machine
- X * starts at state 1 and ends at lastnfa.
- X */
- X
- X /* for ( ns = firstst[state1]; ns <= lastst[state1]; ++ns ) */
- X for ( ns = 1; ns <= lastnfa; ++ns )
- X {
- X fprintf( stderr, "state # %4d\t", ns );
- X
- X sym = transchar[ns];
- X tsp1 = trans1[ns];
- X tsp2 = trans2[ns];
- X anum = accptnum[ns];
- X
- X fprintf( stderr, "%3d: %4d, %4d", sym, tsp1, tsp2 );
- X
- X if ( anum != NIL )
- X fprintf( stderr, " [%d]", anum );
- X
- X fprintf( stderr, "\n" );
- X }
- X
- X fprintf( stderr, "********** end of dump\n" );
- X }
- X
- X
- X/* dupmachine - make a duplicate of a given machine
- X *
- X * synopsis
- X *
- X * copy = dupmachine( mach );
- X *
- X * copy - holds duplicate of mach
- X * mach - machine to be duplicated
- X *
- X * note that the copy of mach is NOT an exact duplicate; rather, all the
- X * transition states values are adjusted so that the copy is self-contained,
- X * as the original should have been.
- X *
- X * also note that the original MUST be contiguous, with its low and high
- X * states accessible by the arrays firstst and lastst
- X */
- X
- int dupmachine( mach )
- int mach;
- X
- X {
- X int i, state, init, last = lastst[mach], state_offset;
- X
- X for ( i = firstst[mach]; i <= last; ++i )
- X {
- X state = mkstate( transchar[i] );
- X
- X if ( trans1[i] != NO_TRANSITION )
- X {
- X mkxtion( finalst[state], trans1[i] + state - i );
- X
- X if ( transchar[i] == SYM_EPSILON && trans2[i] != NO_TRANSITION )
- X mkxtion( finalst[state], trans2[i] + state - i );
- X }
- X
- X accptnum[state] = accptnum[i];
- X }
- X
- X state_offset = state - i + 1;
- X
- X init = mach + state_offset;
- X firstst[init] = firstst[mach] + state_offset;
- X finalst[init] = finalst[mach] + state_offset;
- X lastst[init] = lastst[mach] + state_offset;
- X
- X return ( init );
- X }
- X
- X
- X/* link_machines - connect two machines together
- X *
- X * synopsis
- X *
- X * new = link_machines( first, last );
- X *
- X * new - a machine constructed by connecting first to last
- X * first - the machine whose successor is to be last
- X * last - the machine whose predecessor is to be first
- X *
- X * note: this routine concatenates the machine first with the machine
- X * last to produce a machine new which will pattern-match first first
- X * and then last, and will fail if either of the sub-patterns fails.
- X * FIRST is set to new by the operation. last is unmolested.
- X */
- X
- int link_machines( first, last )
- int first, last;
- X
- X {
- X if ( first == NIL )
- X return ( last );
- X
- X else if ( last == NIL )
- X return ( first );
- X
- X else
- X {
- X mkxtion( finalst[first], last );
- X finalst[first] = finalst[last];
- X lastst[first] = max( lastst[first], lastst[last] );
- X firstst[first] = min( firstst[first], firstst[last] );
- X
- X return ( first );
- X }
- X }
- X
- X
- X/* mkbranch - make a machine that branches to two machines
- X *
- X * synopsis
- X *
- X * branch = mkbranch( first, second );
- X *
- X * branch - a machine which matches either first's pattern or second's
- X * first, second - machines whose patterns are to be or'ed (the | operator)
- X *
- X * note that first and second are NEITHER destroyed by the operation. Also,
- X * the resulting machine CANNOT be used with any other "mk" operation except
- X * more mkbranch's. Compare with mkor()
- X */
- X
- int mkbranch( first, second )
- int first, second;
- X
- X {
- X int eps;
- X
- X if ( first == NO_TRANSITION )
- X return ( second );
- X
- X else if ( second == NO_TRANSITION )
- X return ( first );
- X
- X eps = mkstate( SYM_EPSILON );
- X
- X mkxtion( eps, first );
- X mkxtion( eps, second );
- X
- X return ( eps );
- X }
- X
- X
- X/* mkclos - convert a machine into a closure
- X *
- X * synopsis
- X * new = mkclos( state );
- X *
- X * new - a new state which matches the closure of "state"
- X */
- X
- int mkclos( state )
- int state;
- X
- X {
- X return ( mkopt( mkposcl( state ) ) );
- X }
- X
- X
- X/* mkopt - make a machine optional
- X *
- X * synopsis
- X *
- X * new = mkopt( mach );
- X *
- X * new - a machine which optionally matches whatever mach matched
- X * mach - the machine to make optional
- X *
- X * notes:
- X * 1. mach must be the last machine created
- X * 2. mach is destroyed by the call
- X */
- X
- int mkopt( mach )
- int mach;
- X
- X {
- X int eps;
- X
- X if ( ! SUPER_FREE_EPSILON(finalst[mach]) )
- X {
- X eps = mkstate( SYM_EPSILON );
- X mach = link_machines( mach, eps );
- X }
- X
- X /* can't skimp on the following if FREE_EPSILON(mach) is true because
- X * some state interior to "mach" might point back to the beginning
- X * for a closure
- X */
- X eps = mkstate( SYM_EPSILON );
- X mach = link_machines( eps, mach );
- X
- X mkxtion( mach, finalst[mach] );
- X
- X return ( mach );
- X }
- X
- X
- X/* mkor - make a machine that matches either one of two machines
- X *
- X * synopsis
- X *
- X * new = mkor( first, second );
- X *
- X * new - a machine which matches either first's pattern or second's
- X * first, second - machines whose patterns are to be or'ed (the | operator)
- X *
- X * note that first and second are both destroyed by the operation
- X * the code is rather convoluted because an attempt is made to minimize
- X * the number of epsilon states needed
- X */
- X
- int mkor( first, second )
- int first, second;
- X
- X {
- X int eps, orend;
- X
- X if ( first == NIL )
- X return ( second );
- X
- X else if ( second == NIL )
- X return ( first );
- X
- X else
- X {
- X /* see comment in mkopt() about why we can't use the first state
- X * of "first" or "second" if they satisfy "FREE_EPSILON"
- X */
- X eps = mkstate( SYM_EPSILON );
- X
- X first = link_machines( eps, first );
- X
- X mkxtion( first, second );
- X
- X if ( SUPER_FREE_EPSILON(finalst[first]) &&
- X accptnum[finalst[first]] == NIL )
- X {
- X orend = finalst[first];
- X mkxtion( finalst[second], orend );
- X }
- X
- X else if ( SUPER_FREE_EPSILON(finalst[second]) &&
- X accptnum[finalst[second]] == NIL )
- X {
- X orend = finalst[second];
- X mkxtion( finalst[first], orend );
- X }
- X
- X else
- X {
- X eps = mkstate( SYM_EPSILON );
- X
- X first = link_machines( first, eps );
- X orend = finalst[first];
- X
- X mkxtion( finalst[second], orend );
- X }
- X }
- X
- X finalst[first] = orend;
- X return ( first );
- X }
- X
- X
- X/* mkposcl - convert a machine into a positive closure
- X *
- X * synopsis
- X * new = mkposcl( state );
- X *
- X * new - a machine matching the positive closure of "state"
- X */
- X
- int mkposcl( state )
- int state;
- X
- X {
- X int eps;
- X
- X if ( SUPER_FREE_EPSILON(finalst[state]) )
- X {
- X mkxtion( finalst[state], state );
- X return ( state );
- X }
- X
- X else
- X {
- X eps = mkstate( SYM_EPSILON );
- X mkxtion( eps, state );
- X return ( link_machines( state, eps ) );
- X }
- X }
- X
- X
- X/* mkrep - make a replicated machine
- X *
- X * synopsis
- X * new = mkrep( mach, lb, ub );
- X *
- X * new - a machine that matches whatever "mach" matched from "lb"
- X * number of times to "ub" number of times
- X *
- X * note
- X * if "ub" is INFINITY then "new" matches "lb" or more occurrences of "mach"
- X */
- X
- int mkrep( mach, lb, ub )
- int mach, lb, ub;
- X
- X {
- X int base, tail, copy, i;
- X
- X base = copysingl( mach, lb - 1 );
- X
- X if ( ub == INFINITY )
- X {
- X copy = dupmachine( mach );
- X mach = link_machines( mach, link_machines( base, mkclos( copy ) ) );
- X }
- X
- X else
- X {
- X tail = mkstate( SYM_EPSILON );
- X
- X for ( i = lb; i < ub; ++i )
- X {
- X copy = dupmachine( mach );
- X tail = mkopt( link_machines( copy, tail ) );
- X }
- X
- X mach = link_machines( mach, link_machines( base, tail ) );
- X }
- X
- X return ( mach );
- X }
- X
- X
- X/* mkstate - create a state with a transition on a given symbol
- X *
- X * synopsis
- X *
- X * state = mkstate( sym );
- X *
- X * state - a new state matching sym
- X * sym - the symbol the new state is to have an out-transition on
- X *
- X * note that this routine makes new states in ascending order through the
- X * state array (and increments LASTNFA accordingly). The routine DUPMACHINE
- X * relies on machines being made in ascending order and that they are
- X * CONTIGUOUS. Change it and you will have to rewrite DUPMACHINE (kludge
- X * that it admittedly is)
- X */
- X
- int mkstate( sym )
- int sym;
- X
- X {
- X if ( ++lastnfa >= current_mns )
- X {
- X if ( (current_mns += MNS_INCREMENT) >= MAXIMUM_MNS )
- X lerrif( "input rules are too complicated (>= %d NFA states)",
- X current_mns );
- X
- X ++num_reallocs;
- X
- X transchar = reallocate_integer_array( transchar, current_mns );
- X trans1 = reallocate_integer_array( trans1, current_mns );
- X trans2 = reallocate_integer_array( trans2, current_mns );
- X accptnum = reallocate_integer_array( accptnum, current_mns );
- X firstst = reallocate_integer_array( firstst, current_mns );
- X finalst = reallocate_integer_array( finalst, current_mns );
- X lastst = reallocate_integer_array( lastst, current_mns );
- X }
- X
- X transchar[lastnfa] = sym;
- X trans1[lastnfa] = NO_TRANSITION;
- X trans2[lastnfa] = NO_TRANSITION;
- X accptnum[lastnfa] = NIL;
- X firstst[lastnfa] = lastnfa;
- X finalst[lastnfa] = lastnfa;
- X lastst[lastnfa] = lastnfa;
- X
- X /* fix up equivalence classes base on this transition. Note that any
- X * character which has its own transition gets its own equivalence class.
- X * Thus only characters which are only in character classes have a chance
- X * at being in the same equivalence class. E.g. "a|b" puts 'a' and 'b'
- X * into two different equivalence classes. "[ab]" puts them in the same
- X * equivalence class (barring other differences elsewhere in the input).
- X */
- X
- X if ( sym < 0 )
- X {
- X /* we don't have to update the equivalence classes since that was
- X * already done when the ccl was created for the first time
- X */
- X }
- X
- X else if ( sym == SYM_EPSILON )
- X ++numeps;
- X
- X else
- X {
- X if ( useecs )
- X mkechar( sym, nextecm, ecgroup );
- X }
- X
- X return ( lastnfa );
- X }
- X
- X
- X/* mkxtion - make a transition from one state to another
- X *
- X * synopsis
- X *
- X * mkxtion( statefrom, stateto );
- X *
- X * statefrom - the state from which the transition is to be made
- X * stateto - the state to which the transition is to be made
- X */
- X
- mkxtion( statefrom, stateto )
- int statefrom, stateto;
- X
- X {
- X if ( trans1[statefrom] == NO_TRANSITION )
- X trans1[statefrom] = stateto;
- X
- X else if ( (transchar[statefrom] != SYM_EPSILON) ||
- X (trans2[statefrom] != NO_TRANSITION) )
- X flexfatal( "found too many transitions in mkxtion()" );
- X
- X else
- X { /* second out-transition for an epsilon state */
- X ++eps2;
- X trans2[statefrom] = stateto;
- X }
- X }
- END_OF_FILE
- if test 13633 -ne `wc -c <'nfa.c'`; then
- echo shar: \"'nfa.c'\" unpacked with wrong size!
- fi
- # end of 'nfa.c'
- fi
- echo shar: End of archive 3 \(of 5\).
- cp /dev/null ark3isdone
- MISSING=""
- for I in 1 2 3 4 5 ; do
- if test ! -f ark${I}isdone ; then
- MISSING="${MISSING} ${I}"
- fi
- done
- if test "${MISSING}" = "" ; then
- echo You have unpacked all 5 archives.
- rm -f ark[1-9]isdone
- else
- echo You still need to unpack the following archives:
- echo " " ${MISSING}
- fi
- ## End of shell archive.
- exit 0
-